Who Are We Listening to? Detecting User-generated Content (UGC) on the Web

Authors

  • Marc Egger
  • André Lang
  • Detlef Schoder
Abstract

The analysis of text-based user-generated content (UGC) on the Web has become a highly acclaimed topic in recent years, both in theory and in practice. Because users can now participate in and publicly comment on almost any webpage, UGC is scattered across the Web and mixed with other content types such as advertising texts, product descriptions and editorial articles. Holistic research that aims to listen to the voice of the consumer therefore needs to separate UGC from non-UGC. Unfortunately, whether a text is UGC is not a directly observable attribute of the content. Because the amount of publicly available textual data on the Web is vast and growing rapidly, manual classification is not feasible in this "big data" environment. This creates a previously unmet need for automatic UGC classification, toward which we make three contributions. First, we show that UGC contains signals that enable humans to decide, without any context, whether a text was written by another user. Second, we show that supervised machine learning can use these signals to perform UGC classification automatically. Third, we demonstrate and evaluate the fundamental feasibility of our approach on a dataset of German-language web texts.
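
To make the idea of supervised UGC classification concrete, the following is a minimal sketch only. The abstract does not name the features or the learning algorithm the authors used, so the bag-of-words features, the multinomial Naive Bayes model, the class name NaiveBayesUGC, the label names and the tiny English training set are all assumptions chosen for illustration. They are not the paper's method or its German dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesUGC:
    """Hypothetical multinomial Naive Bayes text classifier (illustration only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        # Per-class word frequencies learned from the labeled examples.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.word_counts[c])
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[c][tok]
                    score += math.log((count + 1) / (self.totals[c] + vocab_size))
            scores[c] = score
        return max(scores, key=scores.get)


# Invented toy data (assumption): first-person, informal texts labeled "ugc",
# promotional or editorial texts labeled "non_ugc".
TRAIN = [
    ("i bought this phone last week and i love it", "ugc"),
    ("honestly my order arrived late, i am so annoyed", "ugc"),
    ("lol i think this is the best pizza i ever had", "ugc"),
    ("the new model features a 6 inch display and 128 gb storage", "non_ugc"),
    ("order now and receive free shipping on all products", "non_ugc"),
    ("the company announced its quarterly results on monday", "non_ugc"),
]

model = NaiveBayesUGC().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("i love my new phone"))
print(model.predict("free shipping on all products"))
```

In this sketch the "signals" are simply word frequencies, such as first-person pronouns in the toy UGC examples. A realistic system would need a much larger labeled corpus and a held-out evaluation set.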

Similar resources

Service Independent Access Control Architecture for User Generated Content (UGC) and Its Implementation

Using Web-based content management systems such as Blog, an end user can easily publish User Generated Content (UGC). Although publishing of UGCs is easy, controlling access to them is a difficult problem for end users. Currently, most of Blog sites offer no access control mechanism, and even when it is available to users, it is not sufficient to control users who do not have an account at the ...

Service Independent Access Control Architecture for User Generated Content (UGC)

Using Web-based contents management systems such as Blog, an end user can easily publish User Generated Content (UGC). Although publishing of UGCs is easy, controlling access to them is a difficult problem for end users. Currently, most of Blog sites offer no access control mechanism, and even when it is available to users, it is not sufficient to control users who do not have an account at the...

Collaborative recommendation with user generated content

In the age of Web 2.0, user generated content (UGC), such as user review and social tag, ubiquitously exists on the Internet. Although there exist different kinds of UGC in recommender systems, the existing works only studied a single kind of UGC in each of their papers. Thus, the previous works lose a chance to uncover the similar effects of different kinds of UGC in recommender systems. In th...

Learning to Recommend with User Generated Content

In the era of Web 2.0, user generated content (UGC), such as social tag and user review, widely exists on the Internet. However, in recommender systems, most of existing related works only study single kind of UGC in each paper, and different types of UGC are utilized in different ways. This paper proposes a unified way to use different types of UGC to improve the prediction accuracy for recomm...

Detecting Public Influence on News Using Topic-Aware Dynamic Granger Test

With the rapid proliferation of Web 2.0, user-generated content (UGC), which is formed by the public to reflect their views and voice, presents rich and timely feedback on news events. Existing research either studies the common and private features between news and UGC, or describes the ability of news media to influence the public opinion. However, in the current highly media-user interactive...

Publication date: 2015